Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German

Author

  • Elizabeth Breidt
Abstract

The usefulness of a statistical approach suggested by Church and Hanks (1989) is evaluated for the extraction of verb-noun (V-N) collocations from German text corpora. Some motivations for extracting V-N collocations from corpora are given, and several differences specific to German are mentioned that have implications for the applicability of extraction methods developed for English. We present precision and recall results for V-N collocations with support verbs and discuss the consequences for further work on the extraction of collocations from German corpora. Depending on the goal, emphasis can be put on high recall for lexicographic purposes or on high precision for automatic lexical acquisition, in each case at the cost of a decrease in the other measure. Low recall can still be acceptable if very large corpora (i.e. 50-100 million words) are available, or if corpora are used for special domains in addition to the data found in machine-readable (collocation) dictionaries.
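As a rough illustration of the two quantities the abstract relies on, the sketch below computes the association ratio of Church and Hanks (1989), i.e. log2 of P(v,n) / (P(v) P(n)), together with precision and recall against a reference list. It is a minimal sketch under stated assumptions, not the procedure used in the paper: probabilities are estimated from a flat list of already extracted (verb, noun) pairs rather than from word windows in a corpus, and the function names and the toy German pairs are hypothetical.

```python
import math
from collections import Counter

def association_ratio(pairs):
    """Score each observed (verb, noun) pair with log2(P(v,n) / (P(v) * P(n))).

    Assumption: probabilities are relative frequencies over the given pair list.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    noun_counts = Counter(n for _, n in pairs)
    total = len(pairs)
    return {
        (v, n): math.log2(c * total / (verb_counts[v] * noun_counts[n]))
        for (v, n), c in pair_counts.items()
    }

def precision_recall(extracted, gold):
    """Precision and recall of an extracted candidate set against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data, for illustration only.
pairs = [
    ("stellen", "Frage"), ("stellen", "Frage"), ("stellen", "Antrag"),
    ("nehmen", "Abschied"), ("nehmen", "Frage"),
]
scores = association_ratio(pairs)
candidates = [pair for pair, score in scores.items() if score > 0.0]
gold = [("stellen", "Frage"), ("nehmen", "Abschied"), ("treffen", "Entscheidung")]
precision, recall = precision_recall(candidates, gold)
```

Raising the score threshold would typically keep fewer candidates, which tends to favour precision over recall; lowering it does the opposite. This is the trade-off the abstract describes.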

Similar resources

Acquisition of Phraseological Units from Linguistically Interpreted Corpora a Case Study on German Pp-verb Collocations

In this paper, we show that accessibility of syntactic information eases collocation extraction from corpora, and supports identification of lexical and structural restrictions related to collocations. For collocation identification we use a corpus that is automatically annotated applying a part-of-speech tagger and a phrase chunker.

Towards a corpus-based dictionary of German noun-verb collocations

We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...

Experiments on Candidate Data for Collocation Extraction

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).

Towards a Lexicon-Grammar of Polish: Extraction of Verbo-Nominal Collocations from Corpora

In the paper we present a contribution to SyntLex, a long-term project aiming at a lexicon-grammar for Polish. A corpus-based method is presented for computer-assisted improvement and/or verification of verbo-nominal lexicon-grammars (in application to Polish). Feasibility study.

FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora

We introduce FipsCoView, an on-line interface for dictionary-like visualisation of collocations detected from parallel corpora using a syntactically-informed extraction method.

Journal title:
  • CoRR

Volume cmp-lg/9603006

Publication date 1993